Run-time determination of application performance with low overhead impact on system performance

ABSTRACT

Techniques are disclosed for determining the run-time performance of an application executing on a computing system with low impact on the performance of the computing system. For example, a time series telemetry data stream is obtained for each of a plurality of key performance indicators during run-time execution of the application on a computing system having a given system configuration. One or more statistical features are extracted from each time series telemetry data stream. Model parameters of a machine learning performance score model are populated with values of the extracted statistical features. A run-time performance score of the application is then determined using the model parameters of the machine learning performance score model populated with the values of the extracted statistical features.

FIELD

The field relates generally to techniques for determining run-time performance of applications executing on a computing system.

BACKGROUND

Today, most computing systems (e.g., computers) can be configured using configuration “knobs” that control various aspects of the computing systems such as, e.g., the amount of memory to utilize for caches, how often data is written to storage, etc. The configuration knob settings for a given computing system can have a significant effect on the run-time behavior of a given workload (e.g., application) being executed by the computing system. In this regard, various tools and techniques have been developed to determine the configuration knob settings that will optimize the performance of a given application. For example, one conventional technique involves utilizing a benchmark application to test the run-time behavior of a specific application or workload for which the benchmark application is created. However, the use of benchmark applications for run-time performance testing of workloads is problematic for various reasons.

For example, a benchmark application is compute intensive and adversely impacts machine performance during execution of the benchmark application on the machine. During execution of the benchmarking process, the target application being tested cannot be utilized by an end user. In addition, the execution of the benchmark application can adversely impact the performance of other processes or applications that the end user is currently running on the given machine. In this regard, end users or customers typically will not allow performance optimization tools to perform background benchmark testing procedures that consume a significant amount of compute power and storage bandwidth, which limits the ability of the performance optimization tools to measure run-time performance of the machine on a continual basis. Another disadvantage associated with using conventional benchmarking applications for performance optimization is that each benchmarking application is custom designed to measure the run-time performance of a specific application. There is no standard benchmark application (e.g., no application-agnostic benchmark application) which can be commonly utilized to determine performance for different types of applications, as each benchmark application is custom designed to run specific test procedures for a specific application.

SUMMARY

Embodiments of the invention include methods for determining the run-time performance of an application executing on a computing system with low impact on the performance of the computing system. For example, in one embodiment, a time series telemetry data stream is obtained for each of a plurality of key performance indicators during run-time execution of the application on a computing system having a given system configuration. One or more statistical features are extracted from each time series telemetry data stream. Model parameters of a machine learning performance score model are populated with values of the extracted statistical features. A run-time performance score of the application is then determined using the model parameters of the machine learning performance score model populated with the values of the extracted statistical features.

Other embodiments of the invention include, without limitation, computing systems and articles of manufacture comprising processor-readable storage media for determining the run-time performance of an application executing on a computing system with low impact on the performance of the computing system.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 schematically illustrates a system for determining the run-time performance of an application executing on a computing system, according to an embodiment of the invention.

FIG. 2 is a flow diagram of a method for training an agnostic machine learning performance score model which is utilized for determining the run-time performance of applications, according to an embodiment of the invention.

FIGS. 3A, 3B, and 3C illustrate model parameters of agnostic machine learning models that are used for run-time determination of storage performance metrics, according to exemplary embodiments of the invention.

FIG. 4 is a flow diagram of a method for run-time determination of storage performance metrics of a given application, according to an embodiment of the invention.

DETAILED DESCRIPTION

Embodiments of the invention will be described herein with regard to systems and methods for determining the run-time performance of an application executing on a computing system with low impact on the performance of the computing system. For example, FIG. 1 schematically illustrates a system 100 for determining the run-time performance of an application executing on a computing system, according to an embodiment of the invention. In particular, the system 100 comprises a service provider computing system 110, a communications network 120, a cloud-based storage system 130, and an end user computing system 140. The service provider computing system 110 comprises benchmarking tools 111, telemetry data instrumentation and application programming interfaces (APIs) 112, a database of training data 113, and a machine learning model training module 114. The database of training data 113 comprises a controlled dataset of labeled training data comprising benchmark performance scores 111-1 and time series telemetry data 112-1. The machine learning model training module 114 utilizes the controlled dataset of labeled training data 111-1 and 112-1 in the database 113 to generate a machine learning performance score model 115, using methods as discussed in further detail below.

The end user computing system 140 (e.g., computer) comprises processors 141, system memory 142, a storage interface 143, a data storage system 144, a network interface 145, virtualization resources 146, telemetry data instrumentation 147, an operating system 148, and a configuration optimization tool 150. The operating system 148 comprises, for example, a file system 148-1 and telemetry data APIs 148-2. The configuration optimization tool 150 comprises a performance score determination module 160 and a performance score and system configuration analysis module 170. The performance score determination module 160 comprises a telemetry data feature extraction module 162, a machine learning performance score model 164, and a database of performance scores 166.

The end user computing system 140 can access the service provider computing system 110 and the cloud-based storage system 130 over the communications network 120. The communications network 120 may comprise, for example, a global computer network (e.g., the Internet), a wide area network (WAN), a local area network (LAN), a satellite network, a telephone or cable network, a cellular network, a wireless network such as Wi-Fi (Wireless Fidelity) or WiMAX (Worldwide Interoperability for Microwave Access), or various portions or combinations of these and other types of networks. The term “network” as used herein is therefore intended to be broadly construed so as to encompass a wide variety of different network arrangements, including combinations of multiple networks possibly of different types. The cloud-based data storage system 130 comprises one or more remote data storage sites that are managed by one or more hosting companies to provide cloud-based storage services to remotely store user data.

The service provider computing system 110 is utilized by the service provider to perform offline processing to develop and periodically update the machine learning performance score model 115. The machine learning performance score model 115 is downloaded to the end user computing system 140 and utilized by the configuration optimization tool 150 to determine the run-time performance of an application executing on the end user computing system 140. The service provider performs a series of controlled experiments to collect the controlled dataset of training data which is stored in the database of training data 113 and utilized by the machine learning model training module 114 for training the machine learning performance score model 115. For example, in one exemplary embodiment, the benchmarking tools 111 are utilized to collect training data which comprises the benchmark performance scores 111-1, and the telemetry data instrumentation and APIs 112 are utilized to collect training data which comprises the time series telemetry data 112-1 for various key performance indicators (KPIs). The corpus of benchmark performance scores 111-1 and time series telemetry data 112-1 comprises a large and diverse set of training data that provides information regarding the performance of various types of applications executing on a wide variety of different machines (e.g., different types of computers with different operating systems, storage resources, processor resources, memory resources, graphics resources, etc.), as well as the various types of applications executing on the same machine with different configurations of the machine resources such as storage resources, processor resources, memory resources, graphics resources, etc.

In particular, in one exemplary embodiment, the benchmarking tools 111 comprise benchmark applications that are utilized to perform multiple run-time tests on applications to measure and score the performance of such applications for various key performance indicators, and store the results as labeled benchmark performance scores 111-1 in the database of training data 113. The benchmark performance scores 111-1 can be collected for different aspects of a given machine configuration including, e.g., storage performance, CPU (central processing unit) performance, memory performance, graphics performance, etc. In one exemplary embodiment, in the context of end-to-end storage performance, the benchmarking tools 111 are utilized to perform storage benchmarking tests to obtain storage performance scores with regard to storage performance metrics such as input/output (IO) operations per second (IOPS), bandwidth, and latency.

The benchmarking tools 111 comprise custom benchmark applications that are specifically designed for the applications being tested. For example, while there are various types of computer-aided design (CAD) applications (e.g., AutoCAD, Solidworks, 3Dmax, etc.), each CAD application will have a custom benchmark application specifically designed for the given CAD application. A given benchmark application will execute on the same machine as a given application to perform various tests on the given application and determine performance scores for different run-time behaviors of the given application executing on the given machine with a given configuration.

The telemetry data instrumentation and APIs 112 include, for example, KPI application programming interfaces of a native operating system, dedicated telemetry data instrumentation, telemetry data that is captured and reported by hardware components, etc. In particular, as is known in the art, operating systems such as Windows and Linux include performance counters that can be configured and utilized to measure various parameters associated with executing processes for the purpose of monitoring system performance. In addition, telemetry data can be acquired using dedicated software instrumentation tools to monitor and measure various parameters and metrics with regard to system performance during the run-time execution of workloads. Further, hardware components and associated hardware drivers typically include performance counters that collect statistics regarding device-level operation. In some embodiments, the time series telemetry data 112-1 is collected for a given application concurrently with performing the benchmark testing of the given application to collect benchmark performance scores 111-1 for the given application.

The machine learning model training module 114 is configured to extract statistical features from the training corpus of time series telemetry data 112-1 and utilize the extracted statistical features of the time series telemetry data 112-1 together with the training corpus of benchmark performance scores 111-1 to train parameters of the machine learning performance score model 115. More specifically, in an exemplary embodiment, the machine learning model training module 114 comprises a feature extraction module that is configured to process the time series telemetry data 112-1 and extract a set of statistical features from the time series telemetry data 112-1 for each KPI that is measured. For example, in some embodiments, as explained in further detail below, the statistical features that are extracted from the time series telemetry data for each KPI comprise various descriptive statistics or summary statistics such as, e.g., mean, standard deviation, variance, etc.

The machine learning model training module 114 comprises methods that are configured to train the machine learning performance score model 115 using, for example, any suitable statistical analysis technique to determine the magnitude of the relationship between the extracted features of the KPI time series telemetry data streams and the target storage performance metrics (e.g., IOPS, read/write bandwidth, read/write latency, etc.). In addition, the machine learning model training module 114 utilizes such relationship information to learn a set of features (e.g., select a subset of features from the training data) that can be utilized as model parameters of an agnostic machine learning performance score model which is configured to predict performance scores for the target storage performance metrics over various types of workloads and system configurations. While the time series telemetry data 112-1 can include data collected for hundreds of KPIs (e.g., 300 or more), the model training process is configured to determine which types of KPI telemetry data are most relevant for use in building a performance score model that can predict a set of target performance metrics for a given functionality, e.g., predict read/write latency, read/write bandwidth, and IOPS performance scores for determining end-to-end storage performance of an application executing on a machine. Various techniques according to embodiments of the invention for obtaining and utilizing a controlled training dataset 111-1 and 112-1 to train the machine learning performance score model 115 will be described in further detail below in conjunction with FIGS. 2, 3A, 3B, and 3C.

The service provider computing system 110 will periodically download (e.g., push) a most current version of the machine learning performance score model 115 to the end user computing system 140 for use by the configuration optimization tool 150. As illustrated in FIG. 1, the machine learning performance score module 164 on the end user computing system 140 represents a previously downloaded version of the machine learning performance score model 115 as generated and provided by the service provider. The configuration optimization tool 150 utilizes the machine learning performance score module 164 to determine run-time performance scores (e.g., end-to-end storage performance scores) of an application executing on the end user computing system 140. In addition, the configuration optimization tool 150 utilizes the run-time performance scores to automatically optimize the configuration of the computing system 140 or otherwise provide smart recommendations to the end user for modifying the configuration of the computing system 140 to alter the run-time performance of the application executing on the computing system 140 with regard to, e.g., storage performance, and achieve an increased level or optimal level of run-time performance of the application.

More specifically, during run-time execution of a given application on the computing system 140, the telemetry data instrumentation 147 and/or telemetry APIs 148-2 of the operating system 148 (e.g., performance counters) will generate time series telemetry data for various KPIs related to resource utilization of the computing system 140, e.g., storage utilization, CPU utilization, memory utilization, graphics utilization, virtual resource utilization, etc. The performance score determination module 160 processes the KPI time series telemetry data for a given set of KPIs to determine one or more run-time performance scores for the given application executing on the computing system 140. While the telemetry data instrumentation 147 and/or telemetry APIs 148-2 will generate time series telemetry data for various KPIs (e.g., hundreds of KPIs), the performance score determination module 160 will utilize a predefined subset of the KPI telemetry data on which the machine learning performance score model 164 was trained. For example, in the context of storage performance, the performance score determination module 160 will acquire and store the KPI telemetry data which is deemed relevant for predicting read/write latency, read/write bandwidth, IOPS, etc., using the machine learning performance score module 164.

The telemetry data feature extraction module 162 implements methods as discussed herein to extract statistical features (e.g., summary statistics) from the time series telemetry data for each of the KPIs within the predefined subset of KPIs. The performance score determination module 160 applies the extracted features to the machine learning performance score module 164 to compute one or more performance scores for the running application. The performance score(s) for the given application are persistently stored in the database of performance scores 166, wherein the determined performance score(s) for the given application are mapped to the current configuration of the computing system 140. Details regarding operating modes of the configuration optimization tool 150 according to exemplary embodiments of the invention will be discussed in further detail below in conjunction with FIG. 4.

Over time, the performance score determination module 160 can determine and store run-time performance scores for a given application for different system configuration settings of the computing system 140. In this regard, the database of performance scores 166 can persistently maintain a vast amount of run-time performance score data and associated system configuration settings information for each of a plurality of applications that execute on the computing system 140. The performance score and system configuration analysis module 170 implements methods for processing the information contained in the database of performance scores 166 to determine an optimal system configuration for a given application.

For example, in an exemplary embodiment, the performance score and system configuration analysis module 170 is configured to compare the different sets of performance scores and associated system configurations in the database 166 for a given application to identify one or more optimal system configurations which have been previously determined to achieve the highest run-time performance scores for the given application. In one embodiment, when a given application is launched on the computing system 140, the performance score and system configuration analysis module 170 can utilize the performance score information contained in the database of performance scores 166 to automatically determine a target system configuration which achieves the highest run-time performance score for the given application, and then automatically adjust one or more resource configuration settings of the computing system 140 to obtain the target system configuration and, thus, increase or optimize the run-time performance of the application executing on the computing system 140.

In this regard, in the context of end-to-end storage performance of a given application, the performance score and system configuration analysis module 170 can characterize the before and after run-time storage performance that is achieved for the given application based on configuration changes of the computing system 140. For example, given an application and two different system configurations and associated run-time performance scores, the performance score and system configuration analysis module 170 can be configured to determine a score or measure which indicates an increase or decrease in the run-time storage performance of the given application that is obtained from switching between the two different system configurations.

In another exemplary embodiment, the performance score and system configuration analysis module 170 is configured to provide smart recommendations to an end user for modifying one or more system configuration settings of the computing system 140 to alter the run-time performance of the given application executing on the computing system 140 and achieve an increased level or optimal level of run-time performance of the application. For example, the end user can issue a query to the configuration optimization tool 150 to request information regarding system configuration settings that would achieve an optimal run-time performance of a given application which the end user intends to execute on the computing system 140, allowing the user to either manually adjust one or more system resource configuration settings or otherwise command the performance score and system configuration analysis module 170 to optimize the system configuration based on the smart recommendations provided.

In another exemplary embodiment, during run-time execution of a given application, the performance score and system configuration analysis module 170 can be configured to analyze or otherwise compare current run-time performance scores (which are determined for the given application for a current system configuration of the computing system 140) against previously determined run-time performance scores of the given application for different system configurations (as contained in the database of performance scores 166) and then, based on the results of such analysis/comparison, automatically determine whether to (i) maintain the current system configuration for the given application or (ii) adjust one or more resource configuration settings to alter the run-time performance of the application executing on the computing system 140 and achieve an increased level or optimal level of run-time performance of the application. In this instance, the performance score and system configuration analysis module 170 can be configured to automatically adjust one or more system resource configuration settings to optimize the run-time performance of the application, or otherwise prompt the user with a notification to recommend adjustments that can be made to one or more system resource configuration settings to enhance the run-time performance of the given application.

As shown in FIG. 1, the system configuration of the end user computing system 140 is based on the types of resources used and the resource configuration settings for, e.g., the processors 141, the system memory 142, the storage interface 143, the data storage system 144, the network interface 145, the virtualization resources 146, and the file system 148-1. In particular, the processors 141 comprise one or more types of hardware processors that are configured to process program instructions and data to execute the native operating system 148 and applications that run on the computing system 140. For example, the processors 141 may comprise one or more central processing units (CPUs), a microprocessor, a microcontroller, an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA), and other types of processors, as well as portions or combinations of such processors. The term “processor” as used herein is intended to be broadly construed so as to include any type of processor that performs processing functions based on software, hardware, firmware, etc. For example, a “processor” is broadly construed so as to encompass all types of hardware processors including, for example, (i) general purpose processors which comprise “performance cores” (e.g., low latency cores), and (ii) workload-optimized processors, which comprise any possible combination of multiple “throughput cores” and/or multiple hardware-based accelerators. Examples of workload-optimized processors include, for example, graphics processing units (GPUs), digital signal processors (DSPs), systems-on-chip (SoCs), application-specific integrated circuits (ASICs), field programmable gate arrays (FPGAs), and other types of specialized processors or coprocessors that are configured to execute one or more fixed functions. The term “hardware accelerator” broadly refers to any hardware that performs “hardware acceleration” to perform certain functions faster and more efficiently than is possible when executing such functions in software running on a more general-purpose processor.

The system memory 142 comprises various types of memory such as volatile random-access memory (RAM), non-volatile random-access memory (NVRAM), or other types of memory, in any combination. The term “memory” or “system memory” as used herein refers to volatile and/or non-volatile memory which is utilized to store application program instructions that are read and processed by the processors 141 to execute the native operating system 148 and other applications on the computing system 140, and to temporarily store data that is utilized and/or generated by the native operating system 148 and application programs running on the computing system 140. For example, the volatile memory may be a dynamic random-access memory (DRAM) (e.g., a DRAM DIMM (Dual In-line Memory Module)) or other forms of volatile RAM. The system memory may comprise non-volatile memory that is configured and accessible as a memory resource. For example, the non-volatile system memory may be one or more of a NAND Flash storage device, an SSD device, or other types of next generation non-volatile memory (NGNVM) devices.

In addition, the system memory 142 is utilized for configuring cache memory. The configuration of the cache memory can have a significant impact on the storage performance of a running application. For example, the parameters and operating modes of the cache memory of the computing system 140, e.g., cache size, write-through mode, write-back mode, etc., can be configured to optimize application performance.

The storage interface 143 enables the processors 141 to interface and communicate with the system memory 142, and other local storage and off-infrastructure storage media (e.g., data storage system 144), using one or more standard communication and/or storage control protocols to read data from or write data to volatile and non-volatile memory/storage devices. Such protocols include, but are not limited to, Non-Volatile Memory Express (NVMe), Peripheral Component Interconnect Express (PCIe), Parallel Advanced Technology Attachment (PATA), Serial ATA (SATA), Serial Attached SCSI (SAS), Fibre Channel, etc.

The data storage system 144 may comprise any type of storage system that is used for persistent storage of data. For example, the data storage system 144 may include storage devices such as hard disk drives (HDDs), flash memory devices, solid-state drive (SSD) devices, etc. The data storage system 144 may be direct-attached storage (DAS), network attached storage (NAS), etc.

The network interface 145 enables the computing system 140 to interface and communicate with a network and other system components (e.g., the cloud-based data storage system 130). For example, the network interface 145 comprises network controllers such as network cards and resources (e.g., network interface controllers (NICs) (e.g., SmartNICs, RDMA-enabled NICs), Host Bus Adapter (HBA) cards, Host Channel Adapter (HCA) cards, input/output (I/O) adaptors, converged Ethernet adaptors, etc.) to support communication protocols and interfaces including, but not limited to, PCIe, direct memory access (DMA) and remote direct memory access (RDMA) data transfer protocols, etc.

The file system 148-1 of the operating system 148 comprises a process that manages the storage and organization of data that is stored on storage media (e.g., HDDs, SSDs, etc.) of the data storage system 144 and the cloud-based data storage system 130. The type of file system configuration (e.g., a File Allocation Table (FAT) file system) that is utilized by the operating system 148 can affect the storage performance of running applications. The storage of the cloud-based data storage system 130 (which is utilized by the computing system 140) will appear as volumes in the file system 148-1 of the operating system 148. In this regard, since the cloud-based data storage system 130 and the network interface 145 are components of an end-to-end storage I/O path, the use of the cloud-based data storage system 130 and the configuration of the network interface 145 for accessing the cloud-based data storage system 130 can have an impact on the end-to-end storage performance of a given application executing on the computing system 140.

The virtualization resources 146 can be instantiated to execute one or more applications or functions which are hosted by the computing system 140. For example, the virtualization resources 146 can be configured to implement the various modules and functionalities of the configuration optimization tool 150 or other applications that execute on the computing system 140 for which run-time performance scores are determined using the configuration optimization tool 150. In one embodiment, the virtualization resources 146 comprise virtual machines that are implemented using a hypervisor platform which executes on the computing system 140, wherein one or more virtual machines can be instantiated to execute functions of the computing system 140. As is known in the art, virtual machines are logical processing elements that may be instantiated on one or more physical processing elements (e.g., servers, computers, or other processing devices). That is, a “virtual machine” generally refers to a software implementation of a machine (i.e., a computer) that executes programs in a manner similar to that of a physical machine. Thus, different virtual machines can run different operating systems and multiple applications on the same physical computer.

A hypervisor is an example of what is more generally referred to as “virtualization infrastructure.” The hypervisor runs on the physical infrastructure, e.g., CPUs and/or storage devices, of the computing system 140, and emulates the CPUs, memory, hard disk, network and other hardware resources of the host system, enabling multiple virtual machines to share the resources. The hypervisor can emulate multiple virtual hardware platforms that are isolated from each other, allowing virtual machines to run, e.g., Linux and Windows Server operating systems on the same underlying physical host. An example of a commercially available hypervisor platform that may be used to implement one or more of the virtual machines in one or more embodiments of the invention is VMware® vSphere™, which may have an associated virtual infrastructure management system such as VMware® vCenter™. The underlying physical infrastructure may comprise one or more commercially available distributed processing platforms which are suitable for the target application.

In another embodiment, the virtualization resources 146 comprise containers such as Docker containers or other types of Linux containers (LXCs). As is known in the art, in a container-based application framework, each application container comprises a separate application and associated dependencies and other components to provide a complete file system, but shares the kernel functions of the host operating system 148 with the other application containers. Each application container executes as an isolated process in user space of the host operating system 148. In particular, a container system utilizes the underlying operating system 148 that provides the basic services to all containerized applications using virtual-memory support for isolation. One or more containers can be instantiated to execute one or more applications or functions of the computing system 140 and the configuration optimization tool 150. In yet another embodiment, containers may be used in combination with other virtualization infrastructure such as virtual machines implemented using a hypervisor, wherein Docker containers or other types of LXCs are configured to run on virtual machines in a multi-tenant environment.

In one embodiment, the configuration optimization tool 150 comprises software that is persistently stored in the local storage resources (e.g., data storage system 144) and loaded into the system memory 142, and executed by the processors 141 to perform respective functions as described herein. In this regard, the system memory 142, and other memory or storage resources as described herein, which have program code and data tangibly embodied thereon, are examples of what is more generally referred to herein as “processor-readable storage media” that store executable program code of one or more software programs. Articles of manufacture comprising such processor-readable storage media are considered embodiments of the invention. An article of manufacture may comprise, for example, a storage device such as a storage disk, a storage array or an integrated circuit containing memory. The term “article of manufacture” as used herein should be understood to exclude transitory, propagating signals.

FIG. 2 is a flow diagram of a method for training an agnostic machine learning performance score model which is utilized for determining the run-time performance of applications, according to an embodiment of the invention. In particular, in one exemplary embodiment, FIG. 2 illustrates offline processing that is performed by a service provider using the computing system 110 of FIG. 1 to develop the machine learning performance score model 115. As noted above, the service provider will perform a series of controlled experiments to collect the training data that is used for training the machine learning performance score model 115. In particular, in one exemplary embodiment, the offline processing comprises performing a series of controlled experiments using predefined workloads (e.g., applications) and associated benchmarking tools to collect labeled data which comprises (i) benchmark performance scores and (ii) time series telemetry data for various key performance indicators (block 200). The controlled experiments are performed to collect a large amount of data for various types of machines, machine configurations, and workloads (e.g., applications). While the various techniques discussed herein can be utilized to determine the run-time performance of an application with respect to various functionalities, for illustrative purposes, exemplary embodiments of the invention will be discussed in the context of methods for determining the run-time end-to-end storage performance of applications for different configurations of computing systems which execute the applications.

More specifically, in the context of modeling end-to-end storage performance of applications, the benchmarking tools 111 are utilized to perform benchmark tests on a wide range of different types of applications to test, e.g., the run-time end-to-end storage performance of the applications executing on a wide range of different types of machines (e.g., personal computers, server computers, etc.) and different configurations (e.g., different types and configurations of caches, storage devices, storage controllers, storage interfaces, etc.) to collect a controlled dataset of benchmark performance scores 111-1. For example, in one exemplary embodiment, in the context of end-to-end storage performance, the benchmarking tools 111 are utilized to perform storage benchmarking tests to obtain storage performance scores with regard to storage performance metrics such as IOPS, read bandwidth, write bandwidth, read latency, write latency, etc.

The IOPS metric is a storage performance measurement that denotes an amount of read or write operations that can be performed in one second of time by a storage device or storage system such as hard disk drives (HDDs), solid-state drives (SSDs), and other types of storage devices or systems (e.g., a RAID storage system). The bandwidth metric is a storage performance measurement with regard to how much throughput (e.g., megabytes/second (MB/s)) a certain storage system/device can provide. While a given storage device (e.g., HDD) can provide a maximum throughput value, there are various factors which limit the ability to achieve the maximum throughput value during real-time operations. Depending on the operating system and the application/service that needs disk access, it will issue a request to read or write a certain amount of data at the same time, which is referred to as the IO size. The IO size could be, e.g., 4 KB (kilobytes), 8 KB, 32 KB, etc. The average IO size × IOPS = throughput in MB/s.
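
To illustrate this relationship with a simple worked example: a workload that sustains 25,000 IOPS with an average IO size of 4 KB delivers a throughput of 4 KB × 25,000 = 100,000 KB/s, i.e., approximately 100 MB/s.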

The latency metric is a storage performance measurement that denotes an amount of time that it takes a storage device or system to complete an operation. In particular, each I/O request will take some average time to complete (e.g., average latency), which is measured in milliseconds (ms). An unduly high amount of storage latency for a given storage system has a direct and immediate impact on workloads running on that storage system. There are various factors which affect storage latency, some of which are physical limits due to the mechanical constructs of, e.g., a standard hard disk. Latency takes into consideration factors such as, e.g., (i) an amount of time it takes a disk to spin around to a target location on the disk, (ii) the amount of time it takes for the system to read or write the data from or to the disk, (iii) the amount of time it takes to transfer the data over a storage interface link, etc.

In addition, the controlled experiments are performed to acquire and record time series telemetry data for various KPIs (e.g., hundreds of KPIs). For example, in the context of end-to-end storage performance, the time series telemetry data is acquired for various KPIs which provide information regarding the performances and behaviors of various components, interfaces, etc., in the end-to-end storage I/O path of the entire storage I/O stack. For example, the storage I/O stack includes storage devices (e.g., HDDs, SSDs, etc.), drivers, hardware interfaces for connecting the host to storage (e.g., SATA, SAS, etc.), host cache memory, the file system, access to cloud-based storage, etc.

As noted above, in the exemplary system embodiment of FIG. 1, the time series telemetry data is collected using the telemetry data instrumentation and APIs 112 and stored as time-stamped telemetry data 112-1 in the database of training data 113. In some embodiments, the time series telemetry data 112-1 is collected for the same workloads that are executed by the benchmarking tools 111 when determining the benchmark performance scores 111-1. In other words, the telemetry data 112-1 is collected concurrently with the data that is collected and processed by the benchmarking tools 111 to compute the benchmark performance scores 111-1. In this regard, the training data provides correspondence or correlation between the collected KPI telemetry data 112-1 and the benchmark performance scores 111-1 for various performance metrics (e.g., IOPS, bandwidth, and latency) as determined by the benchmarking tools 111, when executing the same workloads. This allows the benchmark performance scores 111-1 for the given performance metrics (e.g., IOPS, bandwidth, and latency) to be utilized as a reference (e.g., for purposes of comparison) with regard to the same performance metrics as may be separately determined using the KPI telemetry data 112-1.

Referring again to FIG. 2, following the acquisition of the training corpus, a next stage of the machine learning model training process flow comprises extracting statistical features from the time series telemetry data (block 202). In particular, in one embodiment of the invention, the KPI telemetry data 112-1 is processed using a feature extraction process to extract a set of statistical features from each KPI telemetry data stream. For example, in one embodiment, the statistical features for a given KPI telemetry data stream are extracted by computing various summary statistics from the data sample values of the time series telemetry data stream, wherein the summary statistics include, for example: (i) mean, (ii) standard deviation (std), (iii) minimum (Min), (iv) maximum (Max), (v) 25th percentile, (vi) 50th percentile, and (vii) 75th percentile.

In particular, the mean is computed as the average of all data values of the given KPI time series telemetry data stream. The standard deviation of the data values $V_{i}$ (for $i = 1, \ldots, N$) of the given KPI time series telemetry data stream comprising N data values is computed as:

$\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(V_{i} - \mu\right)^{2}},$ where $\mu$ denotes the mean (or average) of all the data values $V_{i}$ of the given KPI time series telemetry data stream.

Further, the percentile is a measure which indicates the data value below which a given percentage of data values in the given KPI time series telemetry data stream falls. For example, the 25th percentile is the data value below which 25% of the data values of the given KPI time series telemetry data stream may be found. The Min and Max features denote the data values having the smallest value and the largest value, respectively, in the given KPI time series telemetry data stream.

Furthermore, for each KPI time series telemetry data stream (e.g., $V_{1}, V_{2}, V_{3}, \ldots, V_{N-1}, V_{N}$), a KPI time series delta (Δ) stream is calculated as: $\Delta = [(V_{2} - V_{1}), (V_{3} - V_{2}), \ldots, (V_{N} - V_{N-1})]$. In addition, each KPI time series delta (Δ) stream is characterized by computing the statistical values of (i) mean, (ii) standard deviation (std), (iii) minimum (Min), (iv) maximum (Max), (v) 25th percentile, (vi) 50th percentile, and (vii) 75th percentile, as noted above. In this regard, in one exemplary embodiment, fourteen (14) features are extracted for each KPI time series telemetry data stream. The KPI time series delta data streams and associated extracted features provide additional information that enables the machine learning process to understand other behaviors of the time series data, e.g., how noisy the data samples are.
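
By way of illustration only, the following Python sketch computes the fourteen features described above for a single KPI stream. The function names and feature labels are assumptions of the sketch, not part of the embodiments described herein.

    import numpy as np

    def summary_stats(values):
        """The seven summary statistics named above: mean, std, min, max,
        and the 25th/50th/75th percentiles."""
        v = np.asarray(values, dtype=float)
        return {
            "mean": v.mean(),
            "std": v.std(),   # population std (1/N), matching the sigma formula above
            "min": v.min(),
            "max": v.max(),
            "25%": np.percentile(v, 25),
            "50%": np.percentile(v, 50),
            "75%": np.percentile(v, 75),
        }

    def extract_kpi_features(stream):
        """Extract the fourteen features for one KPI time series telemetry
        data stream: the seven statistics of the raw stream plus the seven
        statistics of its delta stream [(V2 - V1), (V3 - V2), ..., (VN - VN-1)]."""
        features = {"raw_" + name: value
                    for name, value in summary_stats(stream).items()}
        delta = np.diff(np.asarray(stream, dtype=float))
        features.update({"delta_" + name: value
                         for name, value in summary_stats(delta).items()})
        return features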

Next, the training process continues by utilizing the training corpus of benchmark performance scores and the extracted statistical features of the KPI time series telemetry data streams to train the parameters of a machine learning performance score model (block 204). The model training process is performed using any suitable statistical analysis technique to determine the magnitude of the relationship between the extracted features of the KPI time series telemetry data streams and the target storage performance metrics (e.g., IOPS, bandwidth, latency), and utilize such relationship information to learn a set of features (e.g., select a subset of features from the training data) that can be utilized as model parameters of an agnostic machine learning performance score model which is configured to predict performance scores for the target storage performance metrics over various types of workloads and system configurations. In other words, the model training process results in a model which learns a function ƒ such that ƒ(x) maps to y, wherein x denotes the features (independent variables) that are used to predict y, and wherein y denotes a target performance metric (dependent variable).

In one exemplary embodiment, the dependencies/relationships between the extracted features of the KPI time series telemetry data streams and the target storage performance metrics (e.g., IOPS, bandwidth, latency) are modeled using a linear regression function whose weights are estimated from the training data. For example, as is well-known to those of ordinary skill in the art, linear regression comprises a supervised machine learning process where a predicted output is continuous and has a constant slope. For example, linear regression implements the standard slope-intercept form: y=mx+b, where y denotes a predicted value, where x denotes an input value (e.g., a feature extracted from the telemetry data), where m denotes a weight (e.g., coefficient), and where b denotes a bias. The machine learning model training process is performed to build a linear regression function between the features of the KPI time series telemetry data and the target performance metrics (e.g., IOPS, bandwidth, latency, etc.), which can serve as a model that can accurately predict storage performance scores for a workload during run-time execution of the workload using a set of extracted features of telemetry data that is captured during run-time execution of the workload.

The model training process is performed to learn the parameters that can be used to compute performance scores for a plurality of target storage performance metrics (e.g., IOPS, bandwidth, latency) given a set of features (x) extracted from time series telemetry data of a given application during run-time of the application. There are various methods that can be utilized to learn the parameters of a linear regression model. In one embodiment, a statistical learning process is performed by minimizing a cost function. A cost function is a measure of the error of a given model in terms of its ability to estimate the relationship between the independent variables x (e.g., the features) and the dependent variable y (e.g., the storage performance metrics). The cost function is typically expressed as a difference or distance between the predicted value ($y_{pred}$) and the actual value ($y_{true}$). The cost function can be estimated by iteratively running the model to compare estimated predictions of y ($y_{pred}$) against known values of y ($y_{true}$). The objective of a machine learning model, therefore, is to find the parameters, weights, or structure that minimizes the cost function.

In one embodiment, a mean squared error (MSE) cost function is utilized, which measures the average squared difference between the actual value ($y_{true}$) and the predicted value ($y_{pred}$). In particular, the MSE is determined as:

$MSE = \frac{1}{N}\sum_{i=1}^{N}\left(y_{true,i} - y_{pred,i}\right)^{2},$ wherein N denotes the number of samples. The MSE computation results in a single value which represents the cost, or score, associated with a current set of weights. The goal is to minimize the MSE value to improve the accuracy of the machine learning model.

When using an MSE cost function with an iterative linear model, an efficient optimization process, such as a gradient descent process, is utilized to determine a minimum of the cost function. The gradient descent process comprises a first-order iterative optimization algorithm which can be used to determine the minimum of a cost function. The gradient descent process enables a machine learning model to learn how the model parameters should be adjusted to further reduce the cost function, e.g., reduce errors (differences between the actual y and the predicted y). As the machine learning model iterates, it gradually converges towards a minimum where further adjustments to the parameters produce little or zero change in the error (e.g., a convergence condition).
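
As a rough illustration of this training procedure, the following Python sketch fits a linear model by gradient descent on the MSE cost. The function name, learning rate, iteration limit, and convergence tolerance are assumptions of the sketch and are not prescribed by the embodiments described above.

    import numpy as np

    def train_linear_model(X, y, learning_rate=1e-3, max_iters=10000, tol=1e-9):
        """Fit y ~ X.w + b by gradient descent on the MSE cost function.

        X: (N, d) matrix of extracted KPI features (independent variables x).
        y: (N,) vector of benchmark performance scores (dependent variable y).
        """
        n_samples, n_features = X.shape
        w = np.zeros(n_features)
        b = 0.0
        prev_cost = np.inf
        for _ in range(max_iters):
            y_pred = X @ w + b
            err = y_pred - y
            cost = np.mean(err ** 2)            # MSE cost
            if abs(prev_cost - cost) < tol:     # convergence condition
                break
            prev_cost = cost
            # First-order updates: gradients of MSE with respect to w and b
            w -= learning_rate * (2.0 / n_samples) * (X.T @ err)
            b -= learning_rate * (2.0 / n_samples) * err.sum()
        return w, b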

When using the MSE cost function as noted above, we consider only the distance between the actual and the predicted values. In some embodiments, we would like to consider a portion of the distance relative to the actual value, which can increase the accuracy when using a metric that takes into account a relative distance. For example, consider storage latency metrics. For higher latency values, we allow the prediction to be at a larger distance from the actual value, while for lower latency values, we keep the prediction closer to the actual value. The following cost function (CF) takes into account such relative distances:

$CF = \frac{1}{2N}\sum_{i=1}^{N}\left(\frac{y_{pred,i} - y_{true,i}}{y_{true,i}}\right)^{2},$ where N denotes the number of samples.
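
A minimal Python rendering of this relative cost function follows (the function name is an assumption of the sketch). Because each error term is divided by $y_{true}$, a fixed absolute error contributes less to the cost at large actual values (e.g., high latency) than at small ones, which is the behavior described above.

    import numpy as np

    def relative_cost(y_true, y_pred):
        """CF = (1 / (2N)) * sum(((y_pred - y_true) / y_true)^2)."""
        y_true = np.asarray(y_true, dtype=float)
        y_pred = np.asarray(y_pred, dtype=float)
        rel_err = (y_pred - y_true) / y_true
        return np.sum(rel_err ** 2) / (2 * len(y_true))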

FIGS. 3A, 3B, and 3C illustrate model parameters of machine learning models for run-time determination of storage performance metrics, according to an embodiment of the invention. In particular, FIG. 3A illustrates a storage performance model ReadLatAvg 300 comprising a set of learned model parameters 302 for predicting performance scores with regard to average latency for read operations. FIG. 3A further illustrates a storage performance model ReadBW 310 comprising a set of learned model parameters 312 for predicting performance scores with regard to bandwidth for read operations. FIG. 3B illustrates a storage performance model WriteIOPS 320 comprising a set of learned model parameters 322 for predicting performance scores with regard to IOPS for write operations. FIG. 3B further illustrates a storage performance model WriteBW 330 comprising a set of learned model parameters 332 for predicting performance scores with regard to bandwidth of write operations. FIG. 3C illustrates a storage performance model ReadIOPS 340 comprising a set of learned model parameters 342 for predicting performance scores with regard to IOPS for read operations. FIG. 3C further illustrates a storage performance model WriteLatAvg 350 comprising a set of learned model parameters 352 for predicting performance scores with regard to average latency of write operations.

As shown in the example embodiments of FIGS. 3A, 3B, and 3C, each set of learned model parameters 302, 312, 322, 332, 342, and 352 comprises ten (10) features and associated coefficients that are learned for the respective storage performance models 300, 310, 320, 330, 340, and 350 as a result of a machine learning process as discussed above. Each feature comprises a given KPI metric (e.g., a performance counter) and a summary statistic that specifies the statistical feature extraction operation that is performed on the time series telemetry data of the given KPI metric to generate the given feature. The features shown in FIGS. 3A, 3B, and 3C include KPI metrics that are obtained using various performance counters such as system performance counters, physical disk performance counters, logical disk performance counters, process performance counters, and cache performance counters.

For example, the system performance counters include:

(i) System\Context Switches/sec.—a system performance counter which indicates a combined rate at which all processors on the given computer system are switched from one thread to another (e.g., context switches occur when a running thread voluntarily relinquishes the processor, is preempted by a higher-priority ready thread, or switches between user-mode and privileged (kernel) mode, etc.);

(ii) System\System Calls/sec.—a system performance counter which indicates a combined rate of calls to operating system service routines by all processes running on the given computer system;

(iii) System\File Data Operations/sec.—a system performance counter which indicates a combined rate of read and write operations on all logical disks on the given computer system;

(iv) System\File Write Operations/sec.—a system performance counter which indicates a combined rate of file system write requests to all devices on the given computer system, including requests to write to data in the file system cache (measured in number of write operations per second);

(v) System\File Read Operations/sec.—a system performance counter which indicates a combined rate of file system read requests to all devices on the given computer system, including requests to read from the file system cache (measured in number of read operations per second);

(vi) System\File Write Bytes/sec.—a system performance counter which indicates an overall rate at which bytes are written to satisfy file system write requests to all devices on the given computer system, including write operations to the file system cache (measured in number of bytes per second); and

(vii) System\File Read Bytes/sec.—a system performance counter which indicates an overall rate at which bytes are read to satisfy file system read requests to all devices on the given computer system, including read operations from the file system cache (measured in number of bytes per second).

The process performance counters monitor running application programs and system processes. For example, the process performance counters include:

(i) Process(X)\IO Read Bytes/sec.—a process performance counter which indicates a rate at which a given process (X) is reading bytes from I/O operations of all I/O activity generated by the process (e.g., file, network and device I/O);

(ii) Process(X)\IO Read Operations/sec.—a process performance counter which indicates a rate at which the given process (X) is issuing read I/O operations for all I/O activity generated by the process (e.g., file, network and device I/O);

(iii) Process(X)\IO Write Bytes/sec.—a process performance counter which indicates a rate at which the given process (X) is writing bytes to I/O operations for all I/O activity generated by the given process (e.g., file, network and device I/O);

(iv) Process(X)\IO Write Operations/sec.—a process performance counter which indicates a rate at which the given process (X) is issuing write I/O operations for all I/O activity generated by the given process (e.g., file, network and device I/O);

(v) Process(X)\IO Data Bytes/sec.—a process performance counter which indicates a rate at which the given process (X) is reading and writing bytes in I/O operations for all I/O activity generated by the given process (e.g., file, network and device I/O); and

(vi) Process(X)\IO Data Operations/sec.—a process performance counter which indicates a rate at which the given process (X) is issuing read and write I/O operations for all I/O activity generated by the given process (e.g., file, network and device I/O).

The physical disk performance counters comprise counters that monitor physical drives (e.g., hard disk drives or solid-state drives) on a given computer system. The values of physical disk counters are sums of the values of the logical disks (or partitions) into which they are divided. For example, the physical disk performance counters include:

(i) PhysicalDisk(_Total)\% Disk Time—a physical disk performance counter which indicates a percentage of time that all physical disk drives of the given computer system are busy servicing read or write requests;

(ii) PhysicalDisk(_Total)\Avg. Disk sec/Read—a physical disk performance counter which indicates an average time, in seconds, of all data reads from all physical disk drives of the given computer system;

(iii) PhysicalDisk(_Total)\Avg. Disk sec/Transfer—a physical disk performance counter which indicates an average time, in seconds, of all disk transfers of all physical disk drives of the given computer system; and

(iv) PhysicalDisk(_Total)\Avg. Disk sec/Write—a physical disk performance counter which indicates an average time, in seconds, of all data writes to all physical disk drives of the given computer system.

The logical disk performance counters comprise counters that monitor logical partitions (or logical disks) of hard disk drives or solid-state disk drives. For example, the logical disk performance counters include:

(i) LogicalDisk(_Total)\% Disk Time—a logical disk performance counter which indicates a percentage of time that all logical disks of the given computer system are busy servicing read or write requests;

(ii) LogicalDisk(_Total)\Avg. Disk sec/Read—a logical disk performance counter which indicates an average time, in seconds, of all data reads from all logical disks of the given computer system;

(iii) LogicalDisk(_Total)\Avg. Disk sec/Transfer—a logical disk performance counter which indicates an average time, in seconds, of all disk transfers of all logical disks of the given computer system; and

(iv) LogicalDisk(_Total)\Avg. Disk sec/Write—a logical disk performancecounter which indicates an average time, in seconds, of all data writesto all logical disks of the given computer system.

The cache performance counters comprise counters that monitor a file system cache. Since a cache is typically utilized by applications, the cache can be monitored as an indicator of application I/O operations. For example, the cache performance counters include Cache\Async Copy Reads/sec., which is a performance counter that indicates a rate at which read operations from pages of the file system cache involve a memory copy of the data from the cache to the application's buffer.
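
By way of a non-limiting illustration, a time series telemetry data stream for a selected subset of the above performance counters can be captured on a Windows system with the built-in typeperf utility. The sketch below samples every 2 seconds for 300 samples (i.e., a 10-minute window, matching the exemplary embodiment described below); the counter list and output file name are illustrative only:

    import subprocess

    KPI_COUNTERS = [
        r"\PhysicalDisk(_Total)\% Disk Time",
        r"\PhysicalDisk(_Total)\Avg. Disk sec/Transfer",
        r"\LogicalDisk(_Total)\Avg. Disk sec/Write",
        r"\Cache\Async Copy Reads/sec",
    ]

    def collect_kpi_stream(out_csv="kpi_stream.csv", interval_s=2, samples=300):
        # -si: sample interval (seconds), -sc: sample count, -o: output file,
        # -y: overwrite the output file without prompting.
        subprocess.run(
            ["typeperf", *KPI_COUNTERS,
             "-si", str(interval_s), "-sc", str(samples),
             "-o", out_csv, "-y"],
            check=True,
        )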

As noted above, each feature of the learned model parameters 302, 312, 322, 332, 342, and 352 shown in FIGS. 3A, 3B and 3C comprises a summary statistic (e.g., std, mean, max, 25%, 50%, 75%) that specifies the statistical feature extraction operation that is performed on the time series telemetry data that is collected for the given KPI metric associated with the given feature. For example, consider the set of learned model parameters 322 for the exemplary storage performance model WriteIOPS 320 shown in FIG. 3B. The model parameters 322 include a first model parameter P1 which comprises a feature System\File Data Operations/sec_75% and a corresponding coefficient (e.g., weight) of −0.23507. The value of this feature is computed as the 75th percentile of a time series telemetry data stream of performance counter values for the System\File Data Operations/sec. metric as noted above, which are captured over a period of time (e.g., data samples captured every 2 seconds over a period of 10 minutes). The value of the model parameter P1 is determined by multiplying the computed value of the feature System\File Data Operations/sec_75% by the corresponding coefficient of −0.23507.

In one exemplary embodiment, the sets of learned model parameters 302, 312, 322, 332, 342, and 352 shown in FIGS. 3A, 3B and 3C are utilized to establish a set of linear formulas that are used to compute storage performance scores for various storage metrics (e.g., IOPS, bandwidth, latency) for read and write operations. In one embodiment, the performance score for a given storage performance metric is computed by adding the model parameter values (e.g., P1+P2+P3+P4+P5+P6+P7+P8+P9+P10) for the corresponding learned model. For example, based on the set of model parameters 322 for the storage performance model WriteIOPS 320 shown in FIG. 3B, a formula for predicting a run-time IOPS performance score for write operations of a given application is as follows:

WriteIOPS =
    [Percentile75(System\File Data Operations/sec) * −0.23507]
  + [Percentile25(System\File Data Operations/sec) * 0.110803]
  + [Percentile50(System\File Data Operations/sec) * 0.060986]
  + [Percentile25(System\System Calls/sec) * 0.031604]
  + [Mean(System\File Write Operations/sec) * 1.131512]
  + [Mean(Process(X)\IO Data Operations/sec) * −0.05654]
  + [Percentile75(Process(X)\IO Data Operations/sec) * 0.288518]
  + [Percentile25(Process(X)\IO Data Operations/sec) * −0.20186]
  + [Percentile50(System\System Calls/sec) * −0.03234]
  + [Mean(System\File Data Operations/sec) * −0.09694]
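
The formula above maps directly to a weighted sum. The following Python sketch encodes the learned (feature, coefficient) pairs transcribed from the formula; the dictionary of extracted feature values (keyed here as "<KPI>_<statistic>", an illustrative naming convention) is assumed to have been computed from the telemetry streams:

    # Learned model parameters for the WriteIOPS storage performance model,
    # transcribed from the formula above: (feature name, coefficient).
    WRITE_IOPS_MODEL = [
        (r"System\File Data Operations/sec_75%",    -0.23507),
        (r"System\File Data Operations/sec_25%",     0.110803),
        (r"System\File Data Operations/sec_50%",     0.060986),
        (r"System\System Calls/sec_25%",             0.031604),
        (r"System\File Write Operations/sec_mean",   1.131512),
        (r"Process(X)\IO Data Operations/sec_mean", -0.05654),
        (r"Process(X)\IO Data Operations/sec_75%",   0.288518),
        (r"Process(X)\IO Data Operations/sec_25%",  -0.20186),
        (r"System\System Calls/sec_50%",            -0.03234),
        (r"System\File Data Operations/sec_mean",   -0.09694),
    ]

    def write_iops_score(features):
        # Each model parameter Pi is (extracted feature value * learned weight);
        # the WriteIOPS performance score is the sum P1 + P2 + ... + P10.
        return sum(features[name] * coeff for name, coeff in WRITE_IOPS_MODEL)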

FIG. 4 is a flow diagram of a method for run-time determination of storage performance metrics of a given application, according to an embodiment of the invention. In particular, in one exemplary embodiment, FIG. 4 illustrates operations that are implemented by the performance score determination module 160 of FIG. 1 during run-time execution of a given application on the computing system 140. During run-time execution of the given application, the performance score determination module 160 will obtain time series telemetry data streams for a predetermined set of KPIs (block 400). As noted above, the predetermined set of KPIs includes KPIs which correspond to model parameters of the machine learning performance score model that is being utilized to determine run-time performance scores for the given application.

Next, the telemetry data feature extraction module 162 will extract one or more statistical features from each KPI time series telemetry data stream (block 402). For example, as noted above, in one embodiment, the statistical features are extracted by computing one or more summary statistics from the time series telemetry data stream including, for example, mean, standard deviation, minimum, maximum, 25th percentile, 50th percentile, and 75th percentile. Again, the types of KPI data streams and summary statistics that are utilized to determine performance scores for the given application will vary depending on the model parameters of the trained performance score model that is being utilized to compute the performance scores. The amount of time series telemetry data that is used to compute a given summary statistic will also vary depending on the application. For example, in an exemplary embodiment where telemetry data samples are collected every 2 seconds, the feature extraction process can be applied to about every 10 minutes of recorded data samples for each of the KPI telemetry data streams.
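
By way of a non-limiting illustration, the feature extraction operation of block 402 can be sketched as follows, assuming each KPI telemetry stream is a one-dimensional array of samples (e.g., 300 samples captured every 2 seconds over 10 minutes); the key naming convention matches the sketch above:

    import numpy as np

    def extract_features(kpi_name, samples):
        # Compute the summary statistics used as model features for one
        # KPI time series telemetry data stream.
        x = np.asarray(samples, dtype=float)
        return {
            f"{kpi_name}_mean": float(np.mean(x)),
            f"{kpi_name}_std":  float(np.std(x, ddof=1)),
            f"{kpi_name}_min":  float(np.min(x)),
            f"{kpi_name}_max":  float(np.max(x)),
            f"{kpi_name}_25%":  float(np.percentile(x, 25)),
            f"{kpi_name}_50%":  float(np.percentile(x, 50)),
            f"{kpi_name}_75%":  float(np.percentile(x, 75)),
        }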

The performance score determination module 160 will then populate the parameter values of the performance score model with the values of the extracted features (block 404). In particular, as discussed above and as illustrated in FIGS. 3A, 3B and 3C, each storage performance model 300, 310, 320, 330, 340 and 350 comprises a respective set of learned model parameters 302, 312, 322, 332, 342, and 352, wherein each model parameter P comprises a feature which corresponds to a KPI metric and the statistical feature extraction operation (e.g., summary statistic) that is performed on the time series telemetry data of the KPI metric (e.g., std, mean, max, 25%, 50%, 75%) to derive the feature value. In the exemplary embodiment, the parameter values of the storage performance models 300, 310, 320, 330, 340 and 350 are populated with the extracted feature values specified by the model parameters.

Next, the performance score determination module 160 will determine one or more performance scores using the performance score model(s) with the parameters populated with the values of the extracted features (block 406). For example, in one embodiment as discussed above, the performance score for a given storage performance metric is computed by adding the model parameter values (e.g., P1+P2+P3+P4+P5+P6+P7+P8+P9+P10) for the corresponding learned model. The parameter value for a given model parameter is determined by multiplying the extracted feature value by the learned coefficient (e.g., weight) for that model parameter.

The performance score determination module 160 will store the determined performance scores for the given application in a persistent storage (e.g., database of performance scores 166, FIG. 1) for subsequent use and analysis by, e.g., the performance score and system configuration analysis module 170 as discussed above (block 408). In the context of run-time storage performance of a given application, the performance score determination module 160 can utilize the storage performance models 300, 310, 320, 330, 340 and 350 shown in FIGS. 3A-3C to determine performance scores for average latency of read operations, bandwidth of read operations, IOPS of write operations, bandwidth of write operations, IOPS of read operations, and average latency of write operations. These storage performance scores can be periodically determined during run-time operation of the application (e.g., every 10 minutes) over the course of a few hours, days, etc., and persistently stored in a data structure which maps the performance scores to a given configuration of the computing system. The overall performance score for a given storage performance metric can be determined as an average of all the performance scores determined for that metric over a given period of time (e.g., hours).
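
By way of a non-limiting illustration, blocks 400 through 408 can be tied together in a periodic scoring loop such as the following sketch, which reuses extract_features from the sketch above. The score models are assumed to be given as lists of (feature name, weight) pairs, and the JSON-lines persistence format is an illustrative stand-in for the database of performance scores 166:

    import json
    import time

    def scoring_loop(models, collect_window, system_config, out_path="scores.jsonl"):
        # models: {"WriteIOPS": [(feature, weight), ...], ...}
        # collect_window: callable returning {kpi_name: [samples]} for one
        # 10-minute window of telemetry samples.
        while True:
            streams = collect_window()                     # block 400
            features = {}
            for kpi_name, samples in streams.items():      # block 402
                features.update(extract_features(kpi_name, samples))
            scores = {                                      # blocks 404 and 406
                name: sum(features[f] * w for f, w in params)
                for name, params in models.items()
            }
            record = {"ts": time.time(), "config": system_config, "scores": scores}
            with open(out_path, "a") as fh:                 # block 408
                fh.write(json.dumps(record) + "\n")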

It is to be appreciated that exemplary systems and methods as described herein allow for determining the run-time performance of an application executing on a computing system with low impact on the performance of the computing system. For example, the techniques disclosed herein eliminate the need for end users to execute custom-designed benchmark applications on their machines to determine the run-time performance of applications. Such custom-designed benchmark applications prevent users from actually using the applications during benchmark testing, and the execution of a benchmark application adds an additional workload which stresses the machine in a way that can adversely impact the performance of other processes or applications that the end user is currently executing on the given machine.

Furthermore, the performance score models that are used to compute run-time performance scores comprise lightweight, application-agnostic statistical models that are trained to predict performance scores for a wide variety of applications and over a wide variety of system configurations and architectures. The performance score models as described herein are generalizable, as such models simply use telemetry data of a running application in order to determine and report performance scores. No tasks or loads that are custom tailored to the application are needed, thereby allowing the performance score models to be utilized for any unknown application. Furthermore, during run-time operation, not all available KPIs need to be recorded and analyzed, as each performance score model utilizes a much smaller subset of the available run-time KPI telemetry data streams (e.g., 10 to 30 KPIs) to compute run-time performance scores for applications executing on a given computing system. Moreover, the techniques for determining performance scores using pre-trained statistical models provide lightweight solutions for determining run-time performance scores without impacting machine performance.

It is to be understood that the above-described embodiments of the invention are presented for purposes of illustration only. Many variations may be made in the particular arrangements shown. For example, although described in the context of particular system and device configurations, the techniques are applicable to a wide variety of other types of information processing systems, computing systems, data storage systems, processing devices and distributed virtual infrastructure arrangements. In addition, any simplifying assumptions made above in the course of describing the illustrative embodiments should also be viewed as exemplary rather than as requirements or limitations of the invention. Numerous other alternative embodiments within the scope of the appended claims will be readily apparent to those skilled in the art.

What is claimed is:
1. A method, comprising: performing, by a system configuration tool executing on a computing system, an automated process to determine a run-time performance of an application executing on the computing system; wherein performing the automated process comprises: obtaining a time series telemetry data stream for each of a plurality of key performance indicators corresponding to utilization of resources of the computing system by the application during run-time execution of the application on the computing system with the computing system having a first system configuration; populating model parameters of a trained machine learning performance score model with parameter values that are determined based on the obtained time series telemetry data streams, wherein at least one model parameter comprises a learned weight value, and a learned feature which specifies (i) an associated key performance indicator, and (ii) an associated statistical feature extraction operation to perform on a time series telemetry data stream associated with the given key performance indicator to compute a value for the learned feature, wherein a parameter value of the at least one model parameter is computed by applying the associated statistical feature extraction operation to at least a portion of the obtained time series telemetry data stream associated with the given key performance indicator to determine the value of the learned feature, and applying the learned weight value to the determined value of the learned feature; determining a run-time performance score of the application using the model parameters of the trained machine learning performance score model populated with the parameter values based on the obtained time series telemetry data streams; and in response to the determined run-time performance score of the application, automatically determining whether to one of: (i) maintain the first system configuration and (ii) adjust at least one resource configuration setting of the computing system to alter a run-time performance of the application, based on the determined run-time performance score of the application; wherein the trained machine learning performance score model comprises one or more storage performance models in which model parameters of the one or more storage performance models are trained to determine one or more storage performance metrics.
2. The method of claim 1, wherein the automated process further comprises persistently storing the determined run-time performance score of the application in a data structure that maps the determined run-time performance score to information regarding the first system configuration of the computing system.
3. The method of claim 1, wherein the trained machine learning performance score model comprises an application agnostic and system configuration agnostic machine learning performance score model, and wherein the one or more storage performance metrics comprise one or more of: (i) average latency for read operations; (ii) average latency of write operations; (iii) bandwidth for read operations; (iv) bandwidth of write operations; (v) input/output operations per second for write operations; and (vi) input/output operations per second for read operations.
4. The method of claim 1, wherein applying the associated statistical feature extraction operation to at least a portion of the obtained time series telemetry data stream associated with the given key performance indicator comprises: computing summary statistics on telemetry data sample values of the obtained time series telemetry data stream associated with the given key performance indicator; wherein the summary statistics comprise one or more of: (i) mean; (ii) standard deviation; (iii) minimum; (iv) maximum; (v) 25th percentile; (vi) 50th percentile; and (vii) 75th percentile.
5. The method of claim 1, wherein determining the run-time performance score of the application using the model parameters of the trained machine learning performance score model populated with the parameter values based on the obtained time series telemetry data streams comprises: computing a parameter value for each model parameter of the trained machine learning performance score model; and determining a sum of the computed parameter values of the model parameters of the trained machine learning performance score model; wherein the determined sum corresponds to a performance score for a given storage performance metric defined by the trained machine learning performance score model.
6. The method of claim 1, wherein the time series telemetry data streams for the plurality of key performance indicators are obtained using one or more performance counters that execute on the computing system, wherein the performance counters comprise one or more of: (i) system performance counters; (ii) physical disk performance counters; (iii) logical disk performance counters; (iv) process performance counters; and (v) cache performance counters.
7. The method of claim 1, wherein the automated process further comprises: determining a second run-time performance score of the application using the model parameters of the trained machine learning performance score model populated with values of extracted statistical features obtained during run-time execution of the application on the computing system with the computing system having a second system configuration, which is different from the first system configuration; comparing the second run-time performance score with a previously determined run-time performance score of the application executing on the computing system having the first system configuration; and determining which of the first and second system configurations of the computing system provides a greater run-time performance of the application based on said comparison of the run-time performance scores.
8. An article of manufacture comprising a non-transitory processor-readable storage medium having stored therein program code of one or more software programs, wherein the program code is executable by one or more processors to perform a method which comprises: performing, by a system configuration tool executing on a computing system, an automated process to determine a run-time performance of an application executing on the computing system; wherein performing the automated process comprises: obtaining a time series telemetry data stream for each of a plurality of key performance indicators corresponding to utilization of resources of the computing system by the application during run-time execution of the application on the computing system with the computing system having a first system configuration; populating model parameters of a trained machine learning performance score model with parameter values that are determined based on the obtained time series telemetry data streams, wherein at least one model parameter comprises a learned weight value, and a learned feature which specifies (i) an associated key performance indicator, and (ii) an associated statistical feature extraction operation to perform on a time series telemetry data stream associated with the given key performance indicator to compute a value for the learned feature, wherein a parameter value of the at least one model parameter is computed by applying the associated statistical feature extraction operation to at least a portion of the obtained time series telemetry data stream associated with the given key performance indicator to determine the value of the learned feature, and applying the learned weight value to the determined value of the learned feature; determining a run-time performance score of the application using the model parameters of the trained machine learning performance score model populated with the parameter values based on the obtained time series telemetry data streams; and in response to the determined run-time performance score of the application, automatically determining whether to one of: (i) maintain the first system configuration and (ii) adjust at least one resource configuration setting of the computing system to alter a run-time performance of the application, based on the determined run-time performance score of the application; wherein the trained machine learning performance score model comprises one or more storage performance models in which model parameters of the one or more storage performance models are trained to determine one or more storage performance metrics.
9. The article of manufacture of claim 8, further comprising program code that is executable by the one or more processors to perform a method comprising persistently storing the determined run-time performance score of the application in a data structure that maps the determined run-time performance score to information regarding the first system configuration of the computing system.
10. The article of manufacture of claim 8, wherein the trained machine learning performance score model comprises an application agnostic and system configuration agnostic machine learning performance score model, and wherein the one or more storage performance metrics comprise one or more of: (i) average latency for read operations; (ii) average latency of write operations; (iii) bandwidth for read operations; (iv) bandwidth of write operations; (v) input/output operations per second for write operations; and (vi) input/output operations per second for read operations.
11. The article of manufacture of claim 8, wherein the program code for applying the associated statistical feature extraction operation to at least a portion of the obtained time series telemetry data stream associated with the given key performance indicator comprises program code for: computing summary statistics on telemetry data sample values of the obtained time series telemetry data stream associated with the given key performance indicator; wherein the summary statistics comprise one or more of: (i) mean; (ii) standard deviation; (iii) minimum; (iv) maximum; (v) 25th percentile; (vi) 50th percentile; and (vii) 75th percentile.
12. The article of manufacture of claim 8, wherein the program code for determining the run-time performance score of the application using the model parameters of the trained machine learning performance score model populated with the parameter values based on the obtained time series telemetry data streams comprises program code for: computing a parameter value for each model parameter of the trained machine learning performance score model; and determining a sum of the computed parameter values of the model parameters of the trained machine learning performance score model; wherein the determined sum corresponds to a performance score for a given storage performance metric defined by the trained machine learning performance score model.
13. The article of manufacture of claim 8, wherein the time series telemetry data streams for the plurality of key performance indicators are obtained using one or more performance counters that execute on the computing system, wherein the performance counters comprise one or more of: (i) system performance counters; (ii) physical disk performance counters; (iii) logical disk performance counters; (iv) process performance counters; and (v) cache performance counters.
14. A computing system, comprising: memory to store software instructions; one or more processors that execute the software instructions to instantiate a system configuration tool which is configured to perform an automated process to determine a run-time performance of an application executing on the computing system, wherein the automated process comprises: obtaining a time series telemetry data stream for each of a plurality of key performance indicators corresponding to utilization of resources of the computing system by the application during run-time execution of the application on the computing system with the computing system having a first system configuration; populating model parameters of a trained machine learning performance score model with parameter values that are determined based on the obtained time series telemetry data streams, wherein at least one model parameter comprises a learned weight value, and a learned feature which specifies (i) an associated key performance indicator, and (ii) an associated statistical feature extraction operation to perform on a time series telemetry data stream associated with the given key performance indicator to compute a value for the learned feature, wherein a parameter value of the at least one model parameter is computed by applying the associated statistical feature extraction operation to at least a portion of the obtained time series telemetry data stream associated with the given key performance indicator to determine the value of the learned feature, and applying the learned weight value to the determined value of the learned feature; determining a run-time performance score of the application using the model parameters of the trained machine learning performance score model populated with the parameter values based on the obtained time series telemetry data streams; and in response to the determined run-time performance score of the application, automatically determining whether to one of: (i) maintain the first system configuration and (ii) adjust at least one resource configuration setting of the computing system to alter a run-time performance of the application, based on the determined run-time performance score of the application; wherein the trained machine learning performance score model comprises one or more storage performance models in which model parameters of the one or more storage performance models are trained to determine one or more storage performance metrics.
15. The computing system of claim 14, wherein the trained machine learning performance score model comprises an application agnostic and system configuration agnostic machine learning performance score model, and wherein the one or more storage performance metrics comprise one or more of: (i) average latency for read operations; (ii) average latency of write operations; (iii) bandwidth for read operations; (iv) bandwidth of write operations; (v) input/output operations per second for write operations; and (vi) input/output operations per second for read operations.
16. The computing system of claim 14, wherein applying the associated statistical feature extraction operation to at least a portion of the obtained time series telemetry data stream associated with the given key performance indicator comprises: computing summary statistics on telemetry data sample values of the obtained time series telemetry data stream associated with the given key performance indicator; wherein the summary statistics comprise one or more of: (i) mean; (ii) standard deviation; (iii) minimum; (iv) maximum; (v) 25th percentile; (vi) 50th percentile; and (vii) 75th percentile.
17. The computing system of claim 14, wherein determining the run-time performance score of the application using the model parameters of the trained machine learning performance score model populated with the parameter values based on the obtained time series telemetry data streams comprises: computing a parameter value for each model parameter of the trained machine learning performance score model; and determining a sum of the computed parameter values of the model parameters of the trained machine learning performance score model; wherein the determined sum corresponds to a performance score for a given storage performance metric defined by the trained machine learning performance score model.
18. The computing system of claim 14, wherein the time series telemetry data streams for the plurality of key performance indicators are obtained using one or more performance counters that execute on the computing system, wherein the performance counters comprise one or more of: (i) system performance counters; (ii) physical disk performance counters; (iii) logical disk performance counters; (iv) process performance counters; and (v) cache performance counters.
19. The computing system of claim 14, wherein the automated process further comprises: determining a second run-time performance score of the application using the model parameters of the trained machine learning performance score model populated with values of extracted statistical features obtained during run-time execution of the application on the computing system with the computing system having a second system configuration, which is different from the first system configuration; comparing the second run-time performance score with a previously determined run-time performance score of the application executing on the computing system having the first system configuration; and determining which of the first and second system configurations of the computing system provides a greater run-time performance of the application based on said comparison of the run-time performance scores.